Online Learning of Aggregate Knowledge about Non-linear 
Preferences Applied to Negotiating Prices and Bundles 
ABSTRACT

In this paper, we consider a form of multi-issue negotiation 
where a shop negotiates both the contents and the price 
of bundles of goods with his customers. We present some 
key insights about, as well as a procedure for, locating mu- 
tually beneficial alternatives to the bundle currently under 
negotiation. The essence of our approach lies in combining 
aggregate (anonymous) knowledge of customer preferences 
with current data about the ongoing negotiation process. 

The developed procedure either works with already obtained 
aggregate knowledge or, in the absence of such knowledge, 
learns the relevant information online. We conduct com- 
puter experiments with simulated customers that have non- 
linear preferences. We show how, for various types of cus- 
tomers, with distinct negotiation heuristics, our procedure 
(with and without the necessary aggregate knowledge) in- 
creases the speed with which deals are reached, as well as 
the number and the Pareto efficiency of the deals reached 
compared to a benchmark. 

1. INTRODUCTION 

Combining two or more items and selling them as one good, 
a practice called bundling, can be a very effective strategy 
for reducing the costs of producing, marketing, and selling 
products. In addition, and maybe more importantly,
bundling can stimulate demand for (other) goods or services
[16]. Stimulating demand by offering bundles of goods
requires knowledge of customer preferences. Traditionally,
firms first acquire such aggregate knowledge about
customer preferences, for example through market research 
or sales data, and then use this knowledge to determine 
which bundle-price combinations they should offer. Especially
for online shops, an appealing alternative approach
would be to negotiate bundle-price combinations with
customers: in that case, aggregate knowledge can be used to
facilitate an interactive search for the desired bundle and 
price. Due to the inherently interactive characteristics of 
negotiation, such an approach can very effectively adapt the 
configuration of a bundle to the preferences of a customer. A 
high degree of bundle customization can increase customer 
satisfaction, which may lead to an increase in the demand 
for future goods or services. 

In this paper, we present an approach that allows a shop 
to make use of aggregate knowledge about customer prefer- 
ences. Our procedure uses aggregate knowledge about many 
customers in bilateral negotiations of bundle-price combina- 
tions with individual customers. Negotiation concerns the 
selection of a subset from a collection of goods or services, 
viz. the bundle, together with a price for that bundle. Thus, 
the bundle configuration — an array of bits, representing the 
presence or absence of each of the shop's goods and services 
in the bundle — together with a price for the bundle, form 
the negotiation issues. In theory, this is just an instance of 
multi-issue negotiation. Like earlier work in this area, our
approach tries to benefit from the so-called win-win oppor- 
tunities offered by multi-issue negotiation, by finding mutu- 
ally beneficial alternative bundles during negotiations. The 
novelty of the approach, however, lies in the use of aggre- 
gate knowledge of customer preferences. We show that the 
bundle that represents the highest 'gains from trade' Pareto- 
dominates all other bundles within a certain collection of 
bundles.^1,2 Based on this important insight, we develop a
procedure for combining aggregate knowledge of customer
preferences with data about an ongoing negotiation process
with one customer, to find alternative bundles that are likely
to lead to high Pareto improvements. Note that due to the 
use of aggregate data only, our approach does not necessitate 
infringement of customers' privacy. 



^1 The gains from trade for a bundle are equal to the customer's
'valuation' of the bundle minus the shop's valuation
of the bundle, which is his (minimum) price (cf. [12]).
^2 An offer constitutes a Pareto improvement over another
offer whenever it makes one bargainer better off without
making the other worse off. A bundle b' 'Pareto-dominates'
another bundle b whenever switching from b to b' results in
a Pareto improvement (cf. [12]).



The procedure we developed requires a process in the fore- 
ground and one in the background. The foreground pro- 
cess uses aggregate knowledge about customer preferences 
to recommend promising alternative bundles during ongo- 
ing negotiations with customers. Intuitively, the idea for 
the process in the foreground is that, whenever the shop de- 
cides to stop bargaining about a bundle b and to switch to 
an alternative bundle, he will choose from a 'neighborhood' 
of b, the bundle that looks promising in the sense that it has 
the highest conditionally expected gains from trade. The 
background process obtains the necessary aggregate knowl- 
edge about customer preferences. Based on this knowledge 
it estimates for each bundle the expected gains from trade, 
conditional on what the ongoing negotiation process reveals 
about the current customer. 

With respect to the background process we consider two 
cases. In the first case, we do not explicitly consider the 
background process: the shop already possesses the neces- 
sary aggregate knowledge. The shop may have obtained this 
aggregate knowledge by having access to expert knowledge 
or by collecting historical sales data and mining this data of- 
fline. The main purpose of this case is to highlight the value 
of the foreground process given sufficient aggregate knowl- 
edge, and to provide an upper bound for the second case. 
In the second case, we explicitly consider the background 
process: the shop does not have any a priori knowledge 
of customer preferences. Instead he learns about customer 
preferences online by interpreting individual customers' re- 
sponses to the shop's proposals for negotiating about alter- 
native bundles. This allows the shop to make progressively 
better estimations of the expected gains from trade. 

To ensure that bundling can stimulate demand for (other) 
goods or services we conduct computer experiments with 
simulated customers that have nonlinear preferences: i.e., a 
customer's valuation for a bundle of goods may be higher 
(or lower) than the sum of the customer's valuations for 
the individual goods. In our experiments, we consider the 
foreground process both with and without the aggregate 
knowledge already being available. In the absence of ag- 
gregate knowledge, the background process will learn the 
relevant information online. We show how, for various types 
of customers — with distinct negotiation heuristics — the fore- 
ground process (both with and without the necessary ag- 
gregate knowledge) increases the speed with which deals are 
reached, as well as the number and the Pareto efficiency 
of the deals reached compared to a benchmark. Moreover, 
through time, the performance of the foreground process 
without a priori information approaches the procedure that 
already possesses the necessary aggregate knowledge. 

The subproblem of just finding a good (or better) bundle 
configuration can be seen as a form of recommending [14],
if we do not consider the negotiation and pricing aspects. 
The general subject of bundling has received a lot of atten- 
tion recently, especially in the context of online information 
goods. The issue of finding the appropriate
bundles is, however, not limited to information goods.
It also occurs outside of the realm of information goods,
where a number of aspects of a complex product can be
selected, such as for a PC, a trip [20], or photography
equipment. Until now, this has not been considered as
part of a negotiation process, to the best of our knowledge.

For numerous real-world applications, like the above examples
of selecting aspects of a complex product, the number
of individual goods to be bundled, n, is relatively small. In
this paper we will also only consider small values of n (say
$n \le 10$), for which aggregate knowledge still greatly
facilitates the process of finding attractive alternative bundles
during a negotiation process. For example, with $n = 10$
there are $2^n - 1 = 1023$ possible bundle configurations, so
facilitating the search process among all those bundles is 
highly valuable. 

This paper builds on and significantly extends the idea,
developed in a preceding paper [18], to negotiate over bundles
and prices using aggregate knowledge. The scope of the 
earlier paper is limited to the foreground process; the nec- 
essary aggregate knowledge is assumed to be already avail- 
able. This approach is warranted because the paper focuses 
on additively separable preferences (i.e., a customer's valua- 
tion for a bundle is always equal to the sum of her valuations 
for the individual goods comprising the bundle). With addi- 
tive separability it suffices to learn the conditional expected 
gains from trade for the individual goods, which
greatly simplifies the problem of learning the required aggre- 
gate knowledge. In this paper we consider non-linear cus- 
tomer preferences, for which learning the desired aggregate 
knowledge can be very difficult. For example, it may be very 
difficult to determine the conditionally expected gains from 
trade by collecting historical sales data and mining those 
data offline. It requires that the sales data reveals the cor- 
relation between customers' valuations for the various bun- 
dles. Such high quality data may not be readily available, 
especially when at the same time customers' privacy should 
be respected, as we assume in this paper. By interpreting 
customers' online responses to the shop's proposals for ne- 
gotiating about alternative bundles, our background process 
circumvents these difficulties. 

The next section provides a high-level overview of the interaction
model. In Section 3 we introduce relatively mild conditions
on the customers' and the shop's preferences. Based
on these conditions, Section 4 develops a procedure for finding
the most promising alternative bundles. In order to test
the performance of our system, we used it in interactions
with simulated customers. In Section 5 we discuss how the
necessary aggregate knowledge of customer preferences is
learned online. The subsequent section presents our computer
experiments and discusses the results, and conclusions follow
at the end.

2. OVERVIEW 

This section gives an overview of the interaction between the 
shop and the customer, as they try to negotiate an agree- 
ment about the price and the composition of a bundle of 
goods. The shop sells a total of n goods, each of which may 
be either absent or present in a bundle, so that there are
$2^n - 1$ distinct bundle configurations containing at least one
good. In the current paper, we use $n = 10$. A negotiation
concerns a bundle (configuration), together with a price for
that bundle, and it is conducted in an alternating exchange
of offers and counter offers [15], typically initiated by the
customer. An example of such a practice may involve the
sales of bundles of news items in categories like politics,
finance, economy, sports, arts, etc.

We develop a procedure that a shop can use to find mutually
beneficial alternative bundles during the negotiation about 
a given bundle, so that alternative bundles may be recom- 
mended whenever the negotiation about the given bundle 
stalls. Specifically, the procedure finds Pareto improvements 
by changing the bundle content.^ It uses information spe- 
cific to the current negotiation process as well as aggregate 
knowledge (obtained from the analysis of sales data, (anony- 
mous) data on previous and current negotiations, market 
research, or expert knowledge). The ongoing negotiation is 
analyzed to determine when an alternative bundle is needed, 
and both the ongoing negotiation process and the aggregate 
knowledge are used to assess which bundle to recommend. 

A customer can explicitly reject a suggested bundle by spec- 
ifying a counter offer with a different bundle content (e.g., 
the previous one), and she can implicitly reject a suggested 
bundle by offering a low price for it. In the current paper, 
only implicit rejection is allowed: customers only specify the 
bundle content for the opening offer, and thereafter only the 
shop can change the bundle content of an offer. This is to 
ease the description of our model and solutions. The possi- 
bility for customers to explicitly reject or change the bundle 
content can be easily incorporated in our model and solu- 
tions, however. 

Figure 1 provides a high-level overview of the interaction
between a shop and a customer. The shaded elements are 
part of the actual negotiation — the exchange of offers. The 
process starts with the customer indicating her interests, by 
specifying the bundle they will initially negotiate about. Af- 
ter that, they enter into a loop (indicated by the dotted line) 
which ends only when a deal is made, or with a 2% exoge- 
nous probability. (We do not model bargainers' impatience 
explicitly; therefore we need an exogenous stopping condi- 
tion, which specifies the chance of bargaining breakdown.) 
In the loop, the customer makes an offer for the current bun- 
dle b, indicating the price she wants to pay for it. The shop 
responds to this offer either by accepting it, or by considering
a recommendation. In any case, conditional upon the
98% continuation probability, the shop also makes an offer,
either for the current bundle b or for a recommended bundle 
b' (which then becomes the current bundle). 
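The effect of the exogenous stopping condition can be made concrete with a small calculation. The sketch below assumes, as the text suggests, that the 2% breakdown chance applies independently in every round of the loop; the function names are ours and purely illustrative.

```python
# Exogenous per-round probability of bargaining breakdown (value taken from the text).
BREAKDOWN_PROB = 0.02


def survival_probability(rounds):
    """Probability that no exogenous breakdown occurs during the first `rounds` rounds."""
    return (1.0 - BREAKDOWN_PROB) ** rounds


def expected_rounds_until_breakdown():
    """Mean of a geometric distribution with success probability BREAKDOWN_PROB."""
    return 1.0 / BREAKDOWN_PROB
```

Under this reading, a negotiation that never reaches a deal lasts 50 rounds on average, and more than half of such negotiations have broken down by round 35.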

In the model, the valuations of the customers and the shop 
are expressed as monetary values. The utilities of deals are 
expressed as strictly monotonic one-dimensional transforma- 
tions of valuations. In the simplest form, this would be the 
difference between the valuation of the bundle and the ne- 
gotiated price. The agents are interested in obtaining a deal 
with optimal utility ("net monetary value"). See Section 3
for details.

3. PREFERENCE MODEL 
3.1 Informal Discussion 

The essence of our model of valuations and preferences lies 
in the assumption that customers and the shop order bun- 
dles based on their net monetary value; the bundle with the 
highest net monetary value is the most preferred bundle. A 
customer's net monetary value of a bundle is equal to the 
customer's valuation of the bundle (expressed in money) minus
the bundle price, and the shop's net monetary value is
equal to the bundle price minus the shop's bundle valuation 
(also expressed in money). 

Given the above assumption and the assumption that a cus- 
tomer wants to buy at most one bundle (within a given time 
period), Section 3.2 shows that any deal involving the bundle
with the highest gains from trade is Pareto efficient. We can 
now specify which is the best bundle for the shop to advise: 
faced with the problem of recommending one bundle out of 
a collection of bundles, the "best" bundle to recommend is 
the bundle with the highest expected gains from trade; this 
bundle Pareto dominates all other bundles. (Section 3.2 can
be skipped upon first reading.)

3.2 Formal Discussion 

Before being able to state the results more formally, some
notation is necessary. Let $N$, with $n = |N|$, denote
the collection of $n$ individual goods and $2^N$ denote the
power set of $N$ (i.e., the collection of all subsets of $N$), then
$B = 2^N \setminus \{\emptyset\}$ denotes the collection of all possible bundles.
Furthermore, let $P = \mathbb{R}$ denote the collection of all possible
bundle prices.^ The customer and the shop attach the
monetary values of $v_c(b)$ and $v_s(b)$, respectively, to a bundle
$b \in B$ (with $v_c(b), v_s(b) \in P$). The function $x_j : B \times P \to \mathbb{R}$
with $j \in \{c, s\}$ denotes the net monetary value for bundle
$b$ at price $p$: $x_c(b, p) = v_c(b) - p$ and $x_s(b, p) = p - v_s(b)$
denote the customer's and the shop's net monetary values,
respectively.

We assume that the customer's and the shop's utility for
consuming bundle $b$ at price $p$, denoted by $u_j(b, p)$ with
$j \in \{c, s\}$, can be expressed as the composition function
$g_j \circ x_j(b, p)$ with $g_j : \mathbb{R} \to \mathbb{R}$. For $g_j$ we assume that
$\frac{d g_j(x)}{dx} > 0$ for all $x \in \mathbb{R}$. Thus we have that
$u_j(b, p) = g_j(x_j(b, p))$ and, since $g_j$ is a strictly increasing
function, we can assume without loss of generality that
$u_j(b, p) = x_j(b, p)$ (cf. [12]).

Given the customer's and shop's monetary values, we define
a useful subset $B^*$ of $B$ as follows:
$B^* = \arg\max_{b \in B} (v_c(b) - v_s(b))$, that is, $B^*$ represents
the collection of bundles with the highest possible gains
from trade (across all possible bundles). We are now ready
to introduce the following proposition.

Proposition 1. A deal $(b, p)$ with $b \in B$ and $p \in P$ is
Pareto efficient if and only if $b \in B^*$.

Remark 1. A deal $(b, p)$ is Pareto efficient if there is no
$(b', p')$ such that $u_j(b, p) \le u_j(b', p')$ for all $j \in \{c, s\}$ and
the inequality is strict for at least one $j$.

Proposition 1 means that a deal is Pareto efficient if and
only if it entails a bundle with the highest possible gains 
from trade. For the proof of this proposition the following 
lemma is very useful. 

^Negative prices may not be realistic, but we want to make 
as few behavioral assumptions as possible. For the results 
the possibility of negative prices is not problematic (see
the footnote in the proof of Proposition 1).




Figure 1: A flowchart describing the integration of recommendation in a shop and a customer's alternating 
exchange of offers and counter offers. 



Lemma 1. For any two deals $(b^*, p^*)$ and $(b, p)$ with
$p^*, p \in P$, $b^* \in B^*$, and $b \in B \setminus B^*$ we have
$x_c(b, p) < x_c(b^*, p^*)$ or $x_s(b, p) < x_s(b^*, p^*)$.

Proof. We prove the above lemma by contradiction. Suppose
that for some $b^* \in B^*$ and $b \in B \setminus B^*$ we have
$x_c(b, p) \ge x_c(b^*, p^*)$ and $x_s(b, p) \ge x_s(b^*, p^*)$. Adding the
two inequalities shows that a necessary condition for this to
hold is that $v_c(b) - v_s(b) \ge v_c(b^*) - v_s(b^*)$. However,
$b^* \in B^*$ and $b \in B \setminus B^*$ means, by definition of $B^*$,
that $v_c(b) - v_s(b) < v_c(b^*) - v_s(b^*)$. □

We are now ready to prove Proposition 1.

Proof. 1. (If) Pick any $j \in \{c, s\}$. Suppose that $j$'s
position improves by moving from any deal $(b, p)$ with
$b \in B^*$ to $(b', p')$, that is, $u_j(b, p) < u_j(b', p')$. It then
suffices to show that the opponent, denoted by $j'$, will
always be made worse off, that is, $u_{j'}(b, p) > u_{j'}(b', p')$.
From the properties of $g_j$ and $g_{j'}$ it follows that a
bargainer's position improves/worsens whenever the net
monetary value increases/decreases. Since $j$'s position
improves, it follows from Lemma 1 that $j'$ is made
worse off whenever $b' \in B \setminus B^*$. Moreover, if $b, b' \in B^*$
then the gains from trade remain unchanged, hence $j'$
is made worse off.

2. (Only if) We will prove this part by contradiction.
Suppose that $b \notin B^*$ with the price being any $p \in P$.
Pick any $b' \in B^*$ and set the bundle price to
$p' = p + v_s(b') - v_s(b)$, so that $p' - v_s(b') = p - v_s(b)$. It
follows from $p \in P$ that $p' \in P$ (recall that $P = \mathbb{R}$)*
and the properties of $g_s$ that the shop is indifferent
between the deals $(b, p)$ and $(b', p')$. Also, it follows
from Lemma 1 and the properties of $g_c$ that the customer
is made better off. That is, any $b' \in B^*$ Pareto
dominates $b \notin B^*$. Thus $b \notin B^*$ cannot be a Pareto
efficient solution.

*If we choose to a priori rule out $p < 0$ and $v_j(b) < 0$ (for
$j \in \{c, s\}$ and all $b \in B$), then $p \ge v_s(b)$ should hold because
otherwise the shop will not be willing to sell the bundle in
the first place. Consequently, $p' \in P$ still holds.



□ 
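Proposition 1 can be checked by brute force on a toy instance. The sketch below uses made-up valuations for three bundles and a coarse integer price grid (both are illustrative assumptions, not data from our experiments), and takes $u_j = x_j$ as in the text. Deals are tested at interior prices only, so that the finite grid does not hide a dominating deal.

```python
from itertools import product

# Hypothetical valuations for three bundles (illustrative numbers only).
v_c = {"a": 10, "b": 14, "ab": 20}  # customer's valuations v_c(b)
v_s = {"a": 6, "b": 9, "ab": 17}    # shop's valuations v_s(b)
all_prices = range(-10, 41)         # finite stand-in for the price set P
test_prices = range(5, 21)          # interior prices at which deals are tested


def gains_from_trade(b):
    return v_c[b] - v_s[b]


def utilities(b, p):
    # With u_j = x_j: the customer gets v_c(b) - p, the shop gets p - v_s(b).
    return (v_c[b] - p, p - v_s[b])


def is_pareto_efficient(b, p):
    """True if no deal on the grid makes one party better off and none worse off."""
    u = utilities(b, p)
    for b2, p2 in product(v_c, all_prices):
        u2 = utilities(b2, p2)
        weakly_better = all(y >= x for x, y in zip(u, u2))
        strictly_better = any(y > x for x, y in zip(u, u2))
        if weakly_better and strictly_better:
            return False
    return True


best = max(gains_from_trade(b) for b in v_c)
b_star = {b for b in v_c if gains_from_trade(b) == best}
efficient = {b for b in v_c
             if all(is_pareto_efficient(b, p) for p in test_prices)}
inefficient = {b for b in v_c
               if not any(is_pareto_efficient(b, p) for p in test_prices)}
```

In this toy instance the bundle with the highest gains from trade is the only one that yields Pareto efficient deals, at every tested price, while deals on the other two bundles are dominated at every tested price.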

4. THE FOREGROUND PROCESS 

The idea is to develop a mechanism for a shop to find Pareto 
improvements by changing the bundle content during a ne- 
gotiation. The mechanism we propose contains two subpro- 
cedures. The first procedure monitors the negotiation pro- 
cess and tells the shop when to recommend, at which time 
the second procedure tells the shop what to recommend, by 
generating recommendations based on aggregate knowledge 
and the ongoing negotiation process. Figure 1 shows the
interaction between these two procedures; they are discussed
in more detail in Sections 4.1 and 4.2, respectively.

4.1 Deciding When to Recommend 

The shop needs a procedure for deciding when he should 
recommend negotiating about a different bundle. The ob- 
vious input for this decision is the progress of the current 
negotiation process, which can be described as a sequence of 
offers and counteroffers. An offer $O$ contains a bundle definition
and a price: $O = (b, p)$ with $b \in B$ and $p \in P$. ($B$ and
$P$ denote the collections of all possible bundles and prices,
respectively.) Let $h = (O(1), O(2), \ldots, O(k))$ denote a finite
history of offers ($k$ is a natural number), where $O(i + 1)$ is
the counter offer for $O(i)$, for all $i < k$. Furthermore, let
$H$ denote the universe of all possible finite offer sequences
(thus $h \in H$). The problem of when to advise can now be
specified as the mapping $f : H \to \{\text{now}, \text{not now}\}$, where
"(not) now" means: (don't) recommend a new bundle now.

We construct a heuristic for $f$ based on the assumption that
there is a probability of not reaching a deal with a customer 
(e.g., a break off, endless repetition, or deadline): the longer 
the negotiation is expected to take, the less likely a deal is 
expected to become. Furthermore, as a deal becomes less 
likely, the incentive for the shop to recommend negotiation 
about an alternative deal should increase. Given the shop's 
bargaining strategy, our heuristic then extrapolates the time 
the current negotiation process will need to reach a deal, 
from the pace with which the customer is currently giving 
in. More precisely, if we let $O = (b, p)$ and $O' = (b, p')$
denote the customer's current and previous offers for bundle
$b$, then $\Delta t$, the predicted remaining number of negotiation
rounds necessary to reach a deal, is defined as follows:

where $v_s(b)$ denotes the shop's monetary value for bundle
$b$. The higher $\Delta t$, the higher the likelihood of a recommendation.
Specifically, the probability of a recommendation
depends on $\Delta t$ as follows:

$$\Pr(\text{recommendation}) = 1 - \exp(-0.25\,\Delta t),$$

which means that the probability that the shop recommends
an alternative bundle approaches 1 as $\Delta t$ increases.
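A minimal sketch of this decision rule follows. It reads the recommendation probability as $1 - \exp(-0.25\,\Delta t)$, which is the form consistent with a probability that approaches 1 as $\Delta t$ grows. The function names and the injectable random source are our own illustrative choices, and $\Delta t$ is taken as a given input.

```python
import math
import random


def recommendation_probability(delta_t, rate=0.25):
    """Pr(recommendation) = 1 - exp(-rate * delta_t), for delta_t >= 0."""
    if delta_t < 0:
        raise ValueError("delta_t must be non-negative")
    return 1.0 - math.exp(-rate * delta_t)


def should_recommend(delta_t, rng=random):
    """Draw the 'now' / 'not now' decision with the probability above."""
    return rng.random() < recommendation_probability(delta_t)
```

With the rate of 0.25, a predicted four remaining rounds already gives a recommendation probability of about 0.63.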

4.2 Deciding What to Recommend 

Our mechanism combines aggregate knowledge (obtained 
from the analysis of sales data, (anonymous) data on pre- 
vious and current negotiations, market research, or expert 
knowledge) with data about the ongoing bargain process, 
to recommend bundles to customers while negotiating with 
them. Suppose, for example, that a customer offers to buy
a bundle $b$ at a price $p$. Whenever a recommendation is
needed (see Section 4.1), the idea is to select from within the
"neighborhood" of bundle $b$, the bundle $b'$ that maximizes
$E[v_c(b') - v_s(b') \mid v_c(b) \ge p]$: the expected gains from trade
for bundle $b'$, given that the customer is willing to pay at
least $p$ for bundle $b$. (To simplify notation we will write
$E[\,\cdot \mid b]$ instead of $E[\,\cdot \mid v_c(b) \ge p]$.) Since the shop knows its
own monetary value for bundle $b'$, $v_s(b')$, the aim is really
to maximize $E[v_c(b') \mid b] - v_s(b')$. The difficulty here lies in
estimating the customer's expected valuation of bundle $b'$:

$$E[v_c(b') \mid b] = \sum_{i} i \cdot \mathrm{pr}(v_c(b') = i \mid b), \qquad (2)$$

where $\mathrm{pr}(v_c(b') = i \mid b)$ denotes the probability that the
customer's valuation for bundle $b'$ is equal to $i$, given that she
is willing to pay at least $p$ for bundle $b$. In Section 5 we
propose an online learning mechanism for determining this
estimation (the background process mentioned in Section 1).
It is, however, instructive to first discuss the recommendation
mechanism in some more detail (i.e., assuming that the
expectations are already known).

A customer initiates the negotiation process by proposing an 
initial bundle and offering an opening price: let $O(0) = (b, p)$
denote the customer's opening offer (with $b \in B$ and $p \in P$).
The shop stores the bundle proposed by the customer as
(his assessment or estimation of) the customer's "interest
bundle," in the neighborhood of which the shop searches
for promising alternative bundles to recommend if, at any
time, the shop decides he should make a recommendation
(see Section 4.1). This neighborhood of bundle $b$, $Ng(b)$, is
defined as follows.

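$$Ng(b) = \{\, b' \in B : (b' \subset b \text{ and } |b'| + 1 = |b|) \text{ or } (b' \supset b \text{ and } |b'| - 1 = |b|) \,\}. \qquad (3)$$

In other words, $Ng(b)$ contains the bundles which, in binary
representation, have a Hamming distance to $b$ of 1.^ The
advantage of advising bundles within the neighborhood of $b$
is that the advice is less likely to appear haphazard.

^Remember that each bundle can be represented as a string
containing n bits indicating the presence or absence in the
bundle, of each of the shop's n goods.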
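The neighborhood is easy to compute on the bit-string representation. The following sketch encodes a bundle as an integer whose n low-order bits mark the goods it contains (an encoding we choose here for illustration) and flips one bit at a time.

```python
def neighborhood(bundle, n):
    """Return all non-empty bundles at Hamming distance 1 from `bundle`.

    `bundle` is an integer bitmask over n goods: bit i is set when good i
    is part of the bundle.
    """
    neighbors = []
    for good in range(n):
        candidate = bundle ^ (1 << good)  # flip one bit: add or remove one good
        if candidate != 0:                # the empty bundle is not an element of B
            neighbors.append(candidate)
    return neighbors
```

For example, `neighborhood(0b0101, 4)` yields the four bundles `0b0100`, `0b0111`, `0b0001`, and `0b1101`. A bundle containing a single good has only n - 1 neighbors, because removing that good would leave the empty bundle.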



Having defined a bundle's neighborhood, let the ordered set
$A$ denote the so-called "recommendation set," obtained by
ordering the neighborhood $Ng(b)$ on the basis of the estimated
expected gains from trade of all the bundles $b'$ in
bundle $b$'s neighborhood, $\hat{E}[v_c(b') \mid b] - v_s(b')$, where $\hat{E}$
denotes the estimation of $E$.

To recommend a bundle $b_k$ (the $k$-th recommendation, with
$k \ge 1$), our mechanism removes the first bundle from $A$,
adds a price to it and proposes it as part of the shop's next
offer. Depending on the customer's counter offer for bundle
$b_k$, the current recommendation set may be replaced: if the
customer's response is very promising (to be defined below) $A$
will be emptied, bundle $b_k$ will be taken as the customer's new
interest bundle (in the neighborhood of which the search
continues), and the bundles in the neighborhood of $b_k$ are added
to $A$.

To specify this in more detail, let $O_t$ denote the sequence
of offers placed by the customer up until time $t$, and let
$\max(O_t)$ specify the customer's past offer with the highest
net monetary value from the shop's perspective. Then
the shop will determine the impact of the $k$-th recommendation
by comparing the net monetary value of the customer's
current offer $O(t + 1)$ with that of offer $\max(O_t)$. For this
purpose, the shop uses the function
$\mathrm{sign}_{b,b'} : \mathbb{R} \times \mathbb{R} \to \{0, 1\}$.
If we let $\max(O_t) = (b, p)$ and the customer's current offer
$O(t + 1) = (b', p')$, then

$$\mathrm{sign}_{b,b'}(p, p') = \begin{cases} 1 & \text{if } p' - v_s(b') > p - v_s(b), \\ 0 & \text{otherwise.} \end{cases} \qquad (4)$$

If $\mathrm{sign}_{b,b'}(p, p') = 1$, then the shop's assessment of the
customer's interest bundle is updated to be $b_k$: the customer's
response is promising enough to divert the search towards
the neighborhood of $b_k$. That is, the first element of $A$ becomes
the bundle $b' \in Ng(b_k)$ with the maximum difference
$\hat{E}[v_c(b') \mid b_k] - v_s(b')$, the second element of $A$ becomes the
bundle $b''$ with the second highest difference
$\hat{E}[v_c(b'') \mid b_k] - v_s(b'')$, and so on.

In case $\mathrm{sign}_{b,b'}(p, p') = 0$, the shop will make the next
recommendation. Before the shop makes the next recom- 
mendation however, he checks if the negotiation is currently 
about the interest bundle. If this is not the case he will first 
make an offer containing the interest bundle. Whenever this 
offer is not accepted by the customer the shop will make the 
next recommendation in the following round. Consequently 
we have the property that a recommendation is always pre- 
ceded by an offer containing the interest bundle. (We will 
see in Section 5.1 that this is a very useful property.)
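The two building blocks of this subsection, the test of whether a response is promising and the ordering of the recommendation set, can be sketched as follows. The lookup table `est_customer_value`, keyed by (interest bundle, candidate bundle), is a hypothetical stand-in for the estimated conditional expectations; all names are ours.

```python
def sign(p_new, vs_new, p_best, vs_best):
    """Return 1 if the customer's current offer (price p_new on a bundle the
    shop values at vs_new) has a higher net monetary value for the shop than
    her best earlier offer (p_best, vs_best), else 0."""
    return 1 if p_new - vs_new > p_best - vs_best else 0


def recommendation_set(interest, neighbors, est_customer_value, v_s):
    """Order `neighbors` by estimated expected gains from trade, best first.

    est_customer_value[(interest, b2)] holds the (assumed given) estimate of
    the customer's expected valuation of b2, conditional on `interest`.
    """
    return sorted(
        neighbors,
        key=lambda b2: est_customer_value[(interest, b2)] - v_s[b2],
        reverse=True,
    )
```

When `sign` returns 1, the shop would rebuild the set by calling `recommendation_set` with the recommended bundle as the new interest bundle; when it returns 0, the next element of the current set is used.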

5. THE BACKGROUND PROCESS 

The ordering of all bundles in the neighborhood of an interest
bundle $b$ constitutes a crucial aspect of the recommendation
mechanism described in Section 4.2. Ideally, given
an interest bundle $b$, the first bundle in the ordering has
the highest expected gains from trade, the second bundle
in the ordering has the second highest expected gains from
trade, and so forth. So, as explained in Section 4.2,
we are interested in knowing $E[v_c(b') \mid b] - v_s(b')$ for all
bundles $b'$ within the neighborhood of bundle $b$, where the
difficulty lies in estimating the customer's expected valuation
$E[v_c(b') \mid b]$. Expert knowledge may provide the shop with
these estimations, but unfortunately such knowledge is of- 
ten not available. We will therefore introduce an effective 
approach for online learning to "correctly" order the bundles 
in the neighborhood of the interest bundle. 

5.1 Using Bargaining Data 

To order the bundles, the shop uses data on the current and 
past bargaining processes. More precisely, suppose the shop 
advises $b'$ given an interest bundle $b$ with the most recent
customer offer of $O = (b, p)$ and that the customer responds
with the counter offer $O' = (b', p')$. The shop then feeds the
triples $\langle b, b', p' - p \rangle$ and $\langle b', b, p - p' \rangle$ as new training
examples to an online learning mechanism.

The recommendation mechanism described in Section 4.2
ensures that the customer's offers $O = (b, p)$ and $O' = (b', p')$
are placed directly after one another; thus, as long as the
customer's strategic misrepresentation of the underlying bundle
values does not jump around too much from one trading
period to the next, the misrepresentation in $p$ and $p'$ will
roughly cancel each other out. Consequently, $p - p'$ will be
a good indication of the difference in a customer's valuations
of bundles $b$ and $b'$, $v_c(b) - v_c(b')$. (Similarly, $p' - p$ will be
a good estimation of $v_c(b') - v_c(b)$.)

Based on these training examples the learning mechanism,
when given $\langle b, b' \rangle$ combined with the shop's valuations
for the two bundles, $v_s(b)$ and $v_s(b')$, predicts
$\Delta gt(b', b) = E[p' - v_s(b') - (p - v_s(b)) \mid b]$: the expected
difference in gains from trade, resulting from changing from
bargaining about bundle $b$ to bargaining about bundle $b'$
(given that a customer expressed an interest in bundle $b$, as
assumed above). To sort bundles in the vicinity of an interest
bundle $b$ according to their expected gains from trade, it
suffices to sort the bundles according to $\Delta gt(b', b)$.
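As a small illustration of how one exchange of offers turns into training data, and how a mean price difference turns into the quantity used for sorting, consider the sketch below. The function names are ours; `mean_price_diff` stands for an estimate of $E[p' - p \mid b]$ obtained from the training examples.

```python
def training_examples(b, p, b_new, p_new):
    """Two training triples from consecutive customer offers (b, p) and
    (b_new, p_new): one for each direction of the bundle switch."""
    return [(b, b_new, p_new - p), (b_new, b, p - p_new)]


def delta_gt(mean_price_diff, vs_b, vs_new):
    """Expected change in gains from trade when switching from b to b_new:
    E[p' - v_s(b') - (p - v_s(b)) | b] = E[p' - p | b] - (v_s(b') - v_s(b))."""
    return mean_price_diff - (vs_new - vs_b)
```

For instance, offers of 10 for bundle "a" and 14 for bundle "ab" give the triples ("a", "ab", 4) and ("ab", "a", -4). If the shop values the two bundles at 6 and 9, a mean price difference of 4 corresponds to an expected change in gains from trade of 1.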

5.2 Complexity Issues 

Knowledge of the correlation between the values of the vari- 
ous bundle pairs is essential for correctly learning to order all 
bundles in the vicinity of an "interest" bundle b. Given that 
the shop sells $n$ individual goods, there are $2^n - 1$ possible
bundles containing at least one good. Learning the correlation
between all bundle pairs requires, in the worst case, comparing
on the order of $(2^n)^2$ bundle pairs. Clearly, for particular
instances of the problem the complexity may be reduced
significantly. Take for instance the situation where the
customer's valuation for a bundle is always equal to the sum of
her valuations for the individual goods comprising the bundle.
In that case it suffices to compare $n$ individual goods
with $2^n$ bundles, reducing the worst-case complexity to
the order of $n \cdot 2^n$. In this paper we focus on the more general
case, where a customer's bundle valuation may not be
equal to the sum of her valuations for the individual goods 
comprising the bundle. 

For this more general customer preference setting, searching
in the neighborhood of the interest bundle has two advantages:
besides making an advice less likely to appear
haphazard, it significantly reduces the number of bundle
pairs that need to be considered. Recall that we defined the
neighborhood of a bundle $b$ as consisting of all bundles at
a Hamming distance of 1 from bundle $b$ (see Section 4.2).
It then requires comparing $n \cdot (n - 1)$ bundle pairs when the
interest bundle has size 1, $\binom{n}{2} \cdot (n - 2)$ additional bundle
pairs when the interest bundle size is 2, and so on. ($\binom{n}{k}$
denotes the binomial coefficient.) Thus, in the worst case, there
are $\sum_{k=1}^{n-1} \binom{n}{k} \cdot (n - k) < n \cdot 2^n$ bundle comparisons
necessary, which is significantly less than $(2^n)^2$.
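The size of this reduction is easy to evaluate numerically. The sketch below reads the count as the sum over bundle sizes k of $\binom{n}{k} \cdot (n - k)$, with k running from 1 to n - 1; this reading and the function names are our assumptions.

```python
from math import comb


def neighborhood_comparisons(n):
    """Number of bundle pairs at Hamming distance 1, counted once per pair:
    each bundle of size k is paired with its n - k one-good extensions."""
    return sum(comb(n, k) * (n - k) for k in range(1, n))


def all_pairs_bound(n):
    """Order of the number of bundle pairs without the neighborhood restriction."""
    return (2 ** n) ** 2
```

For n = 10 this gives 5110 neighborhood comparisons, which is below $n \cdot 2^n = 10240$ and far below $(2^n)^2 = 1048576$.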

5.3 Online Learning Mechanism 

In this paper we consider only the situation where the number
of individual goods to be bundled is relatively small:
i.e., $n \le 10$. (With $n = 10$ there are $2^n - 1 = 1023$ possible
bundle configurations, so facilitating the search process
among all those bundles is still highly valuable.) Since we
only consider bundles within the neighborhood of an interest
bundle, it is tractable, for relatively small values of n,
to explicitly estimate the required conditional expectations
online. Moreover, bargaining with one customer generally
creates numerous training examples which can be used twice
(i.e., $\langle b, b', p' - p \rangle$ and $\langle b', b, p - p' \rangle$ are both stored
as separate training examples, see Section 5.1). For small
values of n, therefore, the learning mechanism can improve
its estimation of the conditional expectations, even given
relatively few customers who provide training examples.

Given the $k$ training examples $\langle \langle b, b', p_1 - p'_1 \rangle, \ldots, \langle b, b', p_k - p'_k \rangle \rangle$, the online learning algorithm simply estimates $\Delta gt(b', b)$, the expected difference in gains from trade resulting from changing from bargaining about bundle $b$ to bargaining about bundle $b'$ (given that a customer expressed an interest in bundle $b$), as the average of the training examples, i.e.,

$$\Delta gt(b', b) = \frac{1}{k} \sum_{i=1}^{k} (p_i - p'_i). \qquad (5)$$
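
The averaging in Equation (5) can be maintained incrementally. The sketch below is one possible way to do so, not the authors' implementation: the class name `GainEstimator`, the dictionary-based storage, and the default of 0 for unseen pairs are all assumptions. Each observed price difference is stored for the pair $(b, b')$ and, with opposite sign, for the mirrored pair $(b', b)$, as described in the text.

```python
from collections import defaultdict


class GainEstimator:
    """Running averages of price differences per ordered bundle pair."""

    def __init__(self):
        self._sum = defaultdict(float)
        self._count = defaultdict(int)

    def add_example(self, b, b_alt, diff):
        """Store the example <b, b', diff> and its mirror <b', b, -diff>."""
        for key, d in (((b, b_alt), diff), ((b_alt, b), -diff)):
            self._sum[key] += d
            self._count[key] += 1

    def estimate(self, b, b_alt, default=0.0):
        """Average of the stored differences; `default` if the pair is unseen
        (the default value is an assumption, not taken from the paper)."""
        k = self._count.get((b, b_alt), 0)
        return self._sum[(b, b_alt)] / k if k else default


est = GainEstimator()
est.add_example("A", "AB", 4.0)
est.add_example("A", "AB", 2.0)
print(est.estimate("A", "AB"), est.estimate("AB", "A"))
```

Bundles only need to be hashable here; in practice they could be frozensets of goods or bit tuples.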

The danger of using $\Delta gt(b', b)$ directly to sort the bundles in the neighborhood of the interest bundle $b$ is that the diversity of the training examples remains limited. Consequently, learning the correct ordering of the bundles is not possible. To allow for sufficient exploration the shop chooses with a probability $p(b, b', M^0)$ (with $M^0 = Ng(b)$) a bundle $b' \in M^0$ to be first in the ordering of $Ng(b)$; once the first bundle in the ordering is determined, say $b^*$, with a probability $p(b, b', M^1)$ (with $M^1 = Ng(b) \setminus \{b^*\}$) the shop chooses a bundle $b' \in M^1$ to be second in the ordering, and so on. The probability $p(b, b', M)$ (with $M \subseteq Ng(b)$) is computed according to the softmax action selection rule, i.e.,

$$p(b, b', M) = \frac{e^{\lambda \cdot \Delta gt(b', b)}}{\sum_{b'' \in M} e^{\lambda \cdot \Delta gt(b'', b)}}, \qquad (6)$$

where $\lambda$ determines the exploratory behavior of the mechanism. The greater $\lambda$, the less exploration will take place, i.e., the higher the probability that the bundle with the highest expected gains from trade will be picked first, the second highest second, and so on. Initially $\lambda$ is very small; it increases over time.
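
The repeated softmax draw without replacement described above can be sketched as follows. This is a minimal illustration under assumptions: the function name `softmax_ordering`, the dictionary of estimates, and the subtraction of the maximum (a standard numerical-stability step that leaves the probabilities unchanged) are not taken from the paper.

```python
import math
import random


def softmax_ordering(estimates, lam, rng=random):
    """Order neighbors by repeated softmax draws without replacement.

    estimates: dict mapping each neighbor b' to its estimated gain difference.
    lam: exploration parameter; larger values mean less exploration.
    """
    remaining = dict(estimates)
    ordering = []
    while remaining:
        top = max(remaining.values())  # shift exponents to avoid overflow
        bundles = list(remaining)
        weights = [math.exp(lam * (remaining[x] - top)) for x in bundles]
        pick = rng.choices(bundles, weights=weights, k=1)[0]
        ordering.append(pick)
        del remaining[pick]
    return ordering


estimates = {"x": 1.0, "y": 5.0, "z": 3.0}
print(softmax_ordering(estimates, lam=0.1))   # mostly exploratory
print(softmax_ordering(estimates, lam=1e6))   # effectively greedy
```

With a very large `lam` the ordering coincides with sorting by estimated gains; with `lam = 0` every permutation of the neighborhood is equally likely.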

6. NUMERICAL EXPERIMENTS 



In order to test the performance of our proposed mechanism, 
we implemented it computationally, and tested it against 
many simulated customers. Valuations for the shop and 
the customers were drawn from random distributions. First 
we describe customer preferences and how we implemented 
negotiations in the simulation, and then we present our ex- 
perimental design and simulation results. 

6.1 Customer preferences 

The goods may be complementary, in which case the valuation of a bundle is higher than the sum of the valuations of the individual goods in the bundle. We model the possibility of complementarities by representing $v_c(b)$, the customer's valuation for a bundle $b$, as a (cubic) polynomial. If we let $N$ denote the collection of $n$ individual goods and the vector $x = (x(1), \ldots, x(n))$ the binary representation of a bundle $b$ (i.e., $x(i) = 1$ if and only if $i \in b$), then

$$v_c(b) = a_0 + \sum_{i \in N} a_i \cdot x(i) + \sum_{i,j \in N} a_{ij} \cdot x(i) \cdot x(j) + \sum_{i,j,k \in N} a_{ijk} \cdot x(i) \cdot x(j) \cdot x(k), \qquad (7)$$

where $a_0$, $a_i$, $a_{ij}$, and $a_{ijk}$ (for $i, j, k \in N$) denote the constant, linear, quadratic, and cubic coefficients of the polynomial, respectively. The quadratic and cubic coefficients determine the extent to which complementarities exist between two and three goods. (Customers buy at most one instance of an individual good, hence we can ignore the possibility of complementarities between identical goods: i.e., $a_{ii} = a_{iii} = 0$ for all $i \in N$.)

An individual customer's values for the various coefficients are randomly distributed. If $a$ denotes an arbitrary instance of all these coefficients, then $a$ gives rise to a multivariate normal distribution, i.e., $pr(a) \sim \mathcal{N}[\mu, \Sigma]$, where the vector $\mu$ and the matrix $\Sigma = [\sigma_{ij}]$ denote the means and (co)variances of the distribution.

From Equation (7) (and the fact that $x(i) \in \{0, 1\}$) it follows that we can obtain all bundle valuations $(v_c(b_1), \ldots, v_c(b_{2^n - 1}))$ by applying a linear transformation on $a$. Consequently, the corresponding probabilities $pr(v_c(b_1), \ldots, v_c(b_{2^n - 1}))$ also form a multivariate normal distribution. That is, we have

$$pr(v_c(b_1), \ldots, v_c(b_{2^n - 1})) \sim \mathcal{N}[T\mu, T\Sigma T'], \qquad (8)$$

where the matrix $T$ specifies the linear transformation (the $j$-th element in the $i$-th row of $T$ specifies whether or not the corresponding $j$-th coefficient in $a$ should contribute to the valuation of the $i$-th bundle).
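
As an illustration of the transformation in Equation (8), the sketch below builds a 0/1 matrix $T$ for a small number of goods and applies it to a mean vector and a covariance matrix using plain Python lists. It rests on assumptions not stated in the paper: each coefficient is stored once per unordered index set (so a pair $\{i, j\}$ has a single coefficient), bundles are enumerated by size, and all function names are hypothetical.

```python
from itertools import combinations


def coefficient_keys(n):
    """Index sets of the coefficients: () for a_0, then singles, pairs, triples."""
    return [()] + [c for r in (1, 2, 3) for c in combinations(range(n), r)]


def all_bundles(n):
    """All 2^n - 1 non-empty bundles as frozensets of good indices."""
    return [frozenset(c) for r in range(1, n + 1)
            for c in combinations(range(n), r)]


def transformation_matrix(n):
    """T[i][j] = 1 if coefficient j contributes to the value of bundle i."""
    keys = coefficient_keys(n)
    return [[1 if set(k) <= b else 0 for k in keys] for b in all_bundles(n)]


def transform_mean(T, mu):
    """Mean vector of the bundle valuations: T * mu."""
    return [sum(t * m for t, m in zip(row, mu)) for row in T]


def transform_cov(T, sigma):
    """Covariance matrix of the bundle valuations: T * Sigma * T'."""
    TS = [[sum(t * s for t, s in zip(row, col)) for col in zip(*sigma)]
          for row in T]
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in T] for r1 in TS]


T = transformation_matrix(2)            # bundles {0}, {1}, {0, 1}
mu = [0.0, 10.0, 20.0, 4.0]             # a_0, a_0', a_1', a_01 (illustrative values)
print(transform_mean(T, mu))
```

A single customer's valuations follow from the same matrix by replacing `mu` with that customer's coefficient vector.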

6.2 Modeling Negotiations 

6.2.1 Time-dependent Strategy

For the customer (shop), the time-dependent bidding strategy is monotonically increasing (decreasing) in both the number of bidding rounds ($t$) and her (his) valuation. In particular, a bidding strategy is characterized by the gap the customer leaves between her initial offer and her valuation, and by the speed with which she closes this gap. The gap is specified as a fraction of the bundle valuation and it decreases over time as $gap(t) = gap_{init} \cdot \exp(-\delta t)$, so over time, she approaches the valuation of the bundle she is currently negotiating about. Note that changes in the gap are time-dependent, but not bundle-dependent. This strategy is therefore called "time-dependent-fraction" (TDF). Almost the same holds for the shop's bidding strategy, mutatis mutandis. The initial gap, $gap_{init}$, is set at 0.5 for the customer and at 1.5 for the shop, and we fix $\delta = 0.03$ for the shop as well as the customer, in order to reduce the number of jointly fluctuating parameters somewhat. Summing up, the customer (shop) starts her (his) bidding for a bundle at (one and a) half her (his) valuation, and her (his) bids gradually approach her (his) valuation.
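
A minimal sketch of the time-dependent-fraction bids follows. The customer side is read directly from the formula for $gap(t)$. The shop side is an interpretation: the text states that the shop starts at one and a half times his valuation and approaches it over time, which is modeled here as a markup of 0.5 that decays at the same rate. The function names and the `markup_init` parameter are assumptions.

```python
import math


def gap(t, gap_init, delta=0.03):
    """Fraction of the valuation still separating the bid from the valuation."""
    return gap_init * math.exp(-delta * t)


def customer_bid(valuation, t, gap_init=0.5, delta=0.03):
    """Customer bids below her valuation and raises the bid over time."""
    return valuation * (1.0 - gap(t, gap_init, delta))


def shop_ask(valuation, t, markup_init=0.5, delta=0.03):
    """Shop asks above his valuation and lowers the ask over time
    (assumed form: starts at 1.5 times the valuation)."""
    return valuation * (1.0 + gap(t, markup_init, delta))


for t in (0, 10, 50):
    print(t, round(customer_bid(100.0, t), 2), round(shop_ask(100.0, t), 2))
```

Because the gap is a fraction of the valuation, switching to a different bundle during the negotiation changes the bid level but not the fraction, which is what makes the strategy time-dependent rather than bundle-dependent.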

6.2.2 Tit-for-Tat Strategy 

The time-dependent strategy described above generates bids irrespective of what the opponent does. As an example of a strategy that responds to the opponent, we implemented a variant of tit-for-tat (TFT). The initial 'move' is already specified by $gap_{init}$, as in the TDF strategy. If in subsequent moves the utility level of the opponent's offer improves, then the same amount is conceded by the bargainer. Note that this is the increment in the utility level as perceived by the bargainer (not by the opponent). Furthermore, this perceived utility improvement can also be negative. To make the bidding behavior less chaotic, no negative concessions are made. That is, we used a so-called monotone version called tit-for-tat-monotone-fraction (TFTM), which can never generate a bid with a worse utility than the previous bid.
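
The monotone tit-for-tat rule can be sketched in terms of utilities as follows. This is one possible reading of the description, not the authors' implementation: the function name, the representation of a bid by the utility the bargainer demands for herself, and the `floor` that stops concessions at zero own utility are all assumptions.

```python
def tftm_next_utility(own_prev_utility, opp_prev_utility, opp_curr_utility,
                      floor=0.0):
    """Utility the bargainer demands with her next bid.

    own_prev_utility: utility demanded with the bargainer's previous bid.
    opp_prev_utility, opp_curr_utility: utility, as perceived by the
        bargainer, of the opponent's previous and current offers.
    floor: lowest utility the bargainer accepts (assumed to be 0).
    """
    improvement = opp_curr_utility - opp_prev_utility
    concession = max(0.0, improvement)  # monotone: no negative concessions
    return max(floor, own_prev_utility - concession)


print(tftm_next_utility(50, 10, 25))  # opponent improved by 15, so concede 15
print(tftm_next_utility(50, 25, 10))  # opponent's offer got worse, so hold
```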

6.3 Experimental Setup 

In the computer experiments reported in this paper, we compare our new approach of having no a priori information and learning customer preferences online (as discussed in Section 5.1) to the one where, for example because of expert knowledge, the shop already knows the underlying joint probability distribution of all bundle valuations (see Section 6.1). The latter approach is also the one discussed and experimented with in more detail in [18]. In this approach, the shop directly derives the value of $E[v_c(b') \mid b]$ for each bundle pair; based on these values the shop computes the expected gains from trade and orders the bundles in the neighborhood of $b$ accordingly.

Besides comparing our new online procedure (referred to as $\mu$) with the previous method (called $S$), we also assess the relative performance of the system by performing the same series of experiments with a benchmark procedure (called $B$), which simply recommends a random bundle from the current bundle's neighborhood. That is, the benchmark does not base the order in which it recommends the next bundle on the estimated expected gains from trade like our system does.

In the experiments, the shop's bundle valuations are de- 
termined by applying a nonlinear bundle reduction. This 
means that the bundle price is generally less than the sum 
of the individual goods comprising the bundle. In order to 
prevent the trivial problem of customers wanting to buy all 
goods, the bundle reduction becomes for bundles which 
contain more than 3 goods. 

There are 10 individual goods. We randomly generate the 
underlying probability density function pr{a). In order to 
ensure sufficient difference in valuations between customers, 
however, we fix the correlation matrix (but not the covari- 



Table 1: Comparison of the different methods $\mu$, $S$ and the benchmark $B$. Figures are averages across 10 runs with different random seeds, and 12,000 customers per run. Standard deviations are given between brackets.

| Performance indicator | Value (all methods and strategies) |
|---|---|
| max. gains | 1202.81 (5.34) |
| min. gains | -1023.61 (54.77) |
| gains $b_{init}$ | 438.70 (15.45) |

| Performance indicator | $\mu$, TDF | $\mu$, TFTM | $S$, TDF | $S$, TFTM | $B$, TDF | $B$, TFTM |
|---|---|---|---|---|---|---|
| gains $b_{int}$ | 763.55 (15.75) | 549.27 (4.23) | 826.18 (16.30) | 573.14 (5.81) | 697.74 (14.19) | 527.55 (11.61) |
| gains $b_{final}$ | 863.33 (4.57) | 797.45 (4.40) | 939.72 (8.60) | 875.98 (7.57) | 777.82 (8.59) | 727.81 (5.96) |
| percentage | 0.85 (0.00) | 0.82 (0.00) | 0.88 (0.00) | 0.85 (0.00) | 0.81 (0.00) | 0.79 (0.00) |
| rel. percentage | 0.52 (0.01) | 0.43 (0.01) | 0.61 (0.01) | 0.54 (0.01) | 0.41 (0.01) | 0.34 (0.01) |
| rounds | 8.24 (0.60) | 5.16 (0.16) | 8.01 (0.65) | 4.71 (0.15) | 13.71 (0.93) | 7.37 (0.29) |
| deals | 10314.20 (143) | 11024.50 (39) | 10340.60 (156) | 11114.20 (40) | 9171.80 (181) | 10496.50 (71) |

Figure 2: Relative performance of the $\mu$-system (on the left), measured by calculating the difference in gains from trade between the bundles $b_{final}$ and $b_{init}$, as a percentage of the difference between the maximum and the initial gains from trade. The shop uses the TDF strategy with $\delta = 0.03$, and the customers use either the TDF strategy (with $\delta = 0.03$), or the TFTM strategy (with $\delta = 1$), as described in Section 6.2. The graph on the right gives the difference in performance between $\mu$ and $S$. (Both graphs actually show the 100-step moving averages.)



ance matrix). We randomly initialize the covariance matrix 
such that we can partition $N$ into 3 subsets (2 of size 3 and 1 of size 4). Selling a customer one of these 3 subsets will often generate the highest gain from trade. (This is the case roughly between 20 and 40% of the time; in the other cases 1 or 2, and sometimes more, goods need to be added or removed.) To test the robustness of our procedure to quantitative changes in the underlying distributions we conducted a series of experiments with 10 different distributions. For each of these settings we simulated negotiations between the shop, with randomly drawn valuations that were kept constant across negotiations, and 12,000 customers, each with her valuations drawn randomly from the particular distribution used. To further test the robustness of our system, each customer was simulated using 2 different negotiation strategies, as described in Section 6.2. (The shop always uses the TDF strategy.)

To allow initiation of the negotiation process by the customer, we assume that the customer starts negotiating about an initial bundle $b_{init}$. In order to give the shop some room for improvement, we initialize the customer's initial bundle by randomly selecting a bundle $b$ which, in binary representation, has a Hamming distance of 3 to the bundle $b^*$ that is associated with the highest gains from trade.
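
Selecting such an initial bundle amounts to flipping three positions in the binary representation of $b^*$. The sketch below shows one way to do this. The function names are illustrative, and redrawing when the result would be the empty bundle is an assumption, since the text does not say how that case is handled. The sketch assumes more goods than flipped positions.

```python
import random


def hamming(a, b):
    """Number of positions in which two binary bundle vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def initial_bundle(best_bundle, distance=3, rng=random):
    """Random bundle at the given Hamming distance from `best_bundle`.

    best_bundle: tuple of 0/1 flags, one per good (len > distance assumed).
    Redraws if the result would contain no goods (assumed handling).
    """
    while True:
        flipped = set(rng.sample(range(len(best_bundle)), distance))
        candidate = tuple(1 - x if i in flipped else x
                          for i, x in enumerate(best_bundle))
        if any(candidate):
            return candidate


best = (1, 1, 1, 0, 0, 0, 0, 0, 0, 0)
start = initial_bundle(best)
print(start, hamming(best, start))
```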



6.4 Results 

The overall results of our experiments are listed in Table 1. The numbers are averages over 12,000 customers drawn from each distribution of valuations, and over 10 different randomly generated distributions; standard deviations (across averages from the 10 distributions) are listed between brackets. The maximum and minimum attainable gains from trade are determined by the current random distribution of valuations; they do not depend on the chosen strategy and method. Likewise, the bundle a customer wants to start negotiating about does not depend on the chosen strategy and method. Therefore, the averages of the figures in the first 3 rows are identical across all experiments: for each shop-customer interaction these figures are known even before the negotiation commences.
Figure 3: The cumulative average number of deals per customer, as attained by the $\mu$-system (on the left) and the 100-step moving average of the number of rounds required to reach those deals (on the right). As in Figure 2, the shop uses the TDF strategy and the customers use either the TDF or the TFTM strategy. The shop manages to reach deals with 11,381 of the TDF-customers, and with 11,554 of the TFTM-customers.



The remainder of the results is measured at the end of each shop-customer interaction, and subsequently averaged over all $10 \cdot 12{,}000$ customers. The row for 'gains $b_{int(erest)}$' shows the gains from trade associated with the bundle in which the shop, at the end of the negotiation with each customer, thinks the customer is most interested. This estimation is most accurately performed by the $S$-system, which is not surprising since it has direct access to the distribution underlying the customer's preferences, even though it does not know each individual customer's actual preferences. But the $\mu$-system, which has to do without this a priori knowledge altogether and instead has to learn about its customers' preferences online, does not do much worse, especially when compared to the benchmark system, $B$, in the rightmost columns.

The row labeled 'gains $b_{final}$' gives the gains from trade associated with the bundle that the shop and the customer were actually negotiating about at the end of the simulation, irrespective of whether that end was caused by the 98% exogenous break-up probability, or by the fact that a deal was reached in the negotiation. The rows for 'percentage' and 'rel(ative) percentage' present the same results in a different way: 'percentage' shows the shop's performance relative to the maximum attainable:

$$\text{percentage} = \frac{\text{gains } b_{final} - \text{min. gains}}{\text{max. gains} - \text{min. gains}},$$

whereas 'relative percentage' takes into account the starting bundle $b_{init}$:

$$\text{relative percentage} = \frac{\text{gains } b_{final} - \text{gains } b_{init}}{\text{max. gains} - \text{gains } b_{init}}.$$
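
The two indicators are straightforward to compute per negotiation; the sketch below does so with illustrative function names. The example plugs in the averaged gains reported for the online method with TDF customers. Because Table 1 presumably averages the per-customer ratios, inserting averaged gains need not reproduce the tabulated values exactly.

```python
def percentage(final_gains, min_gains, max_gains):
    """Performance relative to the full attainable range of gains from trade."""
    return (final_gains - min_gains) / (max_gains - min_gains)


def relative_percentage(final_gains, init_gains, max_gains):
    """Improvement over the starting bundle, relative to the best possible
    improvement over the starting bundle."""
    return (final_gains - init_gains) / (max_gains - init_gains)


# Averaged figures taken from Table 1 (online method, TDF customers).
print(round(percentage(863.33, -1023.61, 1202.81), 2))
print(round(relative_percentage(863.33, 438.70, 1202.81), 2))
```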

Again, in both these rows, as in all the rows more generally, the $S$-system outperforms the $\mu$-system, which beats the $B$-system, but bear in mind the challenge for the $\mu$-system, as compared to the $S$-system, in terms of (dealing with the lack of) available aggregate knowledge. The rows labeled 'rounds' and 'deals' give the average number of rounds it took to reach a deal (whenever a deal was reached) and the average number of deals reached. An observation that can be made is that the shop seems to do better (in terms of gains from trade) when the customers use the TDF strategy than when they use TFTM, although in the former case the shop requires a higher number of rounds to reach deals, and reaches fewer deals, as compared to the latter case.

Figures 2 and 3 illustrate the shop's learning process when using the $\mu$-system. The graph on the left in Figure 2 shows (the 100-step moving average of) the 'relative percentage' from Table 1, measured at the end of the negotiation with each of the 12,000 customers, and averaged over the 10 different preference distributions. The increase over time of the shop's aggregate knowledge of his customers' preferences is clearly visible, for both strategies used by the customers.

The graph on the right in Figure 2 shows the difference of these results between the $\mu$- and the $S$-systems, respectively; the $S$-system does better, but the $\mu$-system closes the gap as it learns more about its customers. Significantly, the difference between the plots for TDF and TFTM disappears, indicating the robustness of the $\mu$-system to changes in the customers' negotiation strategies. So the $\mu$-system is clearly able to learn customers' preferences online, irrespective of the negotiation strategy used by those customers. However, the overall performance of the shop using the $\mu$-system together with his own negotiation strategy is dependent upon the customer's negotiation strategy. More specifically, there is a trade-off in performance. Compared to TFTM, interacting with customers using TDF results in higher gains from trade, fewer deals (see also Figure 3), and more rounds to reach those deals. The explanation for these differences is that with TFTM customers will give in quicker; whenever the shop suggests a good alternative, the amount the customer concedes equals the gains from trade plus the amount conceded by the shop (as perceived by the customer). Consequently, deals are reached more quickly. This also results in more deals being reached, but comes at the expense of the gains from trade because the search process is shorter.



7. CONCLUSIONS AND FUTURE WORK 

In this paper, we consider a form of multi-issue negotiation 
where a shop negotiates both the contents and the price 
of bundles of goods with his customers. To facilitate the 
negotiations of a shop, we develop a procedure that uses 
aggregate (anonymous) knowledge about many customers 
in bilateral negotiations of bundle-price combinations with 
individual customers. By interpreting customers' responses to the shop's proposals for negotiating about alternative bundles, the procedure acquires the desired aggregate knowledge online; it requires no a priori information while respecting customers' privacy.

We conduct computer experiments with simulated customers 
that have nonlinear preferences. We compare our new ap- 
proach of having no a priori information and learning about 
customer preferences online, to the one where — for example 
because of expert knowledge — the shop already knows the 
underlying joint probability distribution of all bundle valu- 
ations. The latter approach is also the one discussed and 
experimented with in more detail in [18]. Our experiments
show how, over time, the performance of our procedure ap- 
proaches that of our previous procedure, which already pos- 
sesses the necessary aggregate knowledge. Both procedures 
significantly increase the speed with which deals are reached, 
as well as the number and the Pareto efficiency of the deals 
reached, as compared to a benchmark. Moreover, the ex- 
periments show that the new procedure is able to learn the 
necessary information online, irrespective of the negotiation 
strategy used by the customers. 